{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在这个教程中，你将会学到如何使用python的pandas包对出租车GPS数据进行数据清洗，识别出行OD\n",
    "\n",
    "<div class=\"alert alert-info\"><h2>提供的基础数据是：</h2><p>    数据：<br>  \n",
    "    1.出租车原始GPS数据(在data-sample文件夹下，原始数据集的抽样500辆车的数据)</p></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[pandas包的简介](https://baike.baidu.com/item/pandas/17209606?fr=aladdin)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 读取数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "首先，读取出租车数据。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:51:53.552930Z",
     "start_time": "2020-01-18T04:51:52.397018Z"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "#读取数据\n",
    "data = pd.read_csv(r'data-sample/TaxiData-Sample',header = None)\n",
    "#给数据命名列\n",
    "data.columns = ['VehicleNum', 'Stime', 'Lng', 'Lat', 'OpenStatus', 'Speed']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:51:58.299239Z",
     "start_time": "2020-01-18T04:51:58.271312Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>VehicleNum</th>\n",
       "      <th>Stime</th>\n",
       "      <th>Lng</th>\n",
       "      <th>Lat</th>\n",
       "      <th>OpenStatus</th>\n",
       "      <th>Speed</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>22271</td>\n",
       "      <td>22:54:04</td>\n",
       "      <td>114.167000</td>\n",
       "      <td>22.718399</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>22271</td>\n",
       "      <td>18:26:26</td>\n",
       "      <td>114.190598</td>\n",
       "      <td>22.647800</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>22271</td>\n",
       "      <td>18:35:18</td>\n",
       "      <td>114.201401</td>\n",
       "      <td>22.649700</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>22271</td>\n",
       "      <td>16:02:46</td>\n",
       "      <td>114.233498</td>\n",
       "      <td>22.725901</td>\n",
       "      <td>0</td>\n",
       "      <td>24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>22271</td>\n",
       "      <td>21:41:17</td>\n",
       "      <td>114.233597</td>\n",
       "      <td>22.720900</td>\n",
       "      <td>0</td>\n",
       "      <td>19</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   VehicleNum     Stime         Lng        Lat  OpenStatus  Speed\n",
       "0       22271  22:54:04  114.167000  22.718399           0      0\n",
       "1       22271  18:26:26  114.190598  22.647800           0      4\n",
       "2       22271  18:35:18  114.201401  22.649700           0      0\n",
       "3       22271  16:02:46  114.233498  22.725901           0     24\n",
       "4       22271  21:41:17  114.233597  22.720900           0     19"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#显示数据的前5行\n",
    "data.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "数据的格式：\n",
    "\n",
    ">VehicleNum —— 车牌  \n",
    "Stime —— 时间  \n",
    "Lng —— 经度  \n",
    "Lat —— 纬度  \n",
    "OpenStatus —— 是否有乘客(0没乘客，1有乘客)  \n",
    "Speed —— 速度  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 基础的数据操作"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## DataFrame和Series"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "DataFrame和Series\n",
    "\n",
    "   > 当我们读一个数据的时候，我们读进来的就是DataFrame格式的数据表，而一个DataFrame中的每一列，则为一个Series  \n",
    "    也就是说，DataFrame由多个Series组成\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:52:25.713432Z",
     "start_time": "2020-01-18T04:52:25.708450Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果我们想取DataFrame的某一列，想得到的是Series，那么直接用以下代码\n",
    "\n",
    "   > data[列名]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:52:32.097575Z",
     "start_time": "2020-01-18T04:52:32.090592Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(data['Lng'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-09-06T09:22:43.642625Z",
     "start_time": "2019-09-06T09:22:43.638487Z"
    }
   },
   "source": [
    "如果我们想取DataFrame的某一列或者某几列，想得到的是DataFrame，那么直接用以下代码\n",
    "\n",
    "> data2[[列名,列名]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:52:33.013124Z",
     "start_time": "2020-01-18T04:52:32.990186Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(data[['Lng']])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据的筛选"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "数据的筛选:\n",
    "\n",
    "    在筛选数据的时候，我们一般用data[条件]的格式\n",
    "    其中的条件，是对data每一行数据的true和false布尔变量的Series"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "    例如，我们想得到车牌照为22271的所有数据\n",
    "    首先我们要获得一个布尔变量的Series，这个Series对应的是data的每一行，如果车牌照为\"粤B4H2K8\"则为true，不是则为false\n",
    "    这样子的Series很容易获得，只需要\n",
    "    data['VehicleNum']==22271"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:52:44.078571Z",
     "start_time": "2020-01-18T04:52:44.049646Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    True\n",
       "1    True\n",
       "2    True\n",
       "3    True\n",
       "4    True\n",
       "Name: VehicleNum, dtype: bool"
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(data['VehicleNum']==22271).head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:52:51.723416Z",
     "start_time": "2020-01-18T04:52:51.688510Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>VehicleNum</th>\n",
       "      <th>Stime</th>\n",
       "      <th>Lng</th>\n",
       "      <th>Lat</th>\n",
       "      <th>OpenStatus</th>\n",
       "      <th>Speed</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>22271</td>\n",
       "      <td>22:54:04</td>\n",
       "      <td>114.167000</td>\n",
       "      <td>22.718399</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>22271</td>\n",
       "      <td>18:26:26</td>\n",
       "      <td>114.190598</td>\n",
       "      <td>22.647800</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>22271</td>\n",
       "      <td>18:35:18</td>\n",
       "      <td>114.201401</td>\n",
       "      <td>22.649700</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>22271</td>\n",
       "      <td>16:02:46</td>\n",
       "      <td>114.233498</td>\n",
       "      <td>22.725901</td>\n",
       "      <td>0</td>\n",
       "      <td>24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>22271</td>\n",
       "      <td>21:41:17</td>\n",
       "      <td>114.233597</td>\n",
       "      <td>22.720900</td>\n",
       "      <td>0</td>\n",
       "      <td>19</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   VehicleNum     Stime         Lng        Lat  OpenStatus  Speed\n",
       "0       22271  22:54:04  114.167000  22.718399           0      0\n",
       "1       22271  18:26:26  114.190598  22.647800           0      4\n",
       "2       22271  18:35:18  114.201401  22.649700           0      0\n",
       "3       22271  16:02:46  114.233498  22.725901           0     24\n",
       "4       22271  21:41:17  114.233597  22.720900           0     19"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#得到车牌照为22271的所有数据\n",
    "data[data['VehicleNum']==22271].head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果我们想要对data删去所有牌照为22271的数据，所需要的操作也很简单：\n",
    "    \n",
    "> data[-(条件)]\n",
    "    \n",
    "注意，如果你不想要奇奇怪怪的bug出现，请按照我上面给的格式来筛选数据，建议不要用data.drop()来删数据，，data.drop()只在你想删除某一列的时候用"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:53:05.516849Z",
     "start_time": "2020-01-18T04:53:05.403154Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>VehicleNum</th>\n",
       "      <th>Stime</th>\n",
       "      <th>Lng</th>\n",
       "      <th>Lat</th>\n",
       "      <th>OpenStatus</th>\n",
       "      <th>Speed</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>1437</td>\n",
       "      <td>35807</td>\n",
       "      <td>01:53:46</td>\n",
       "      <td>113.809898</td>\n",
       "      <td>22.626801</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1438</td>\n",
       "      <td>35807</td>\n",
       "      <td>01:43:46</td>\n",
       "      <td>113.813301</td>\n",
       "      <td>22.623600</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1439</td>\n",
       "      <td>35807</td>\n",
       "      <td>01:14:15</td>\n",
       "      <td>113.847000</td>\n",
       "      <td>22.594700</td>\n",
       "      <td>0</td>\n",
       "      <td>41</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1440</td>\n",
       "      <td>35807</td>\n",
       "      <td>02:01:41</td>\n",
       "      <td>113.852501</td>\n",
       "      <td>22.625700</td>\n",
       "      <td>0</td>\n",
       "      <td>22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1441</td>\n",
       "      <td>35807</td>\n",
       "      <td>01:01:59</td>\n",
       "      <td>113.897003</td>\n",
       "      <td>22.551901</td>\n",
       "      <td>0</td>\n",
       "      <td>42</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      VehicleNum     Stime         Lng        Lat  OpenStatus  Speed\n",
       "1437       35807  01:53:46  113.809898  22.626801           0      0\n",
       "1438       35807  01:43:46  113.813301  22.623600           0      0\n",
       "1439       35807  01:14:15  113.847000  22.594700           0     41\n",
       "1440       35807  02:01:41  113.852501  22.625700           0     22\n",
       "1441       35807  01:01:59  113.897003  22.551901           0     42"
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#删除车牌照为22271的所有数据\n",
    "data[-(data['VehicleNum']==22271)].head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 获取/删除/定义DataFrame的某一列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:53:06.731768Z",
     "start_time": "2020-01-18T04:53:06.688883Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Stime</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>22:54:04</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>18:26:26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>18:35:18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>16:02:46</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>21:41:17</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Stime\n",
       "0  22:54:04\n",
       "1  18:26:26\n",
       "2  18:35:18\n",
       "3  16:02:46\n",
       "4  21:41:17"
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#获取列'Stime',注意，此操作不会影响到data，你在操作后必须将得到的表重新赋值给data才有影响\n",
    "data[['Stime']].head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:53:07.602784Z",
     "start_time": "2020-01-18T04:53:07.471136Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>VehicleNum</th>\n",
       "      <th>Stime</th>\n",
       "      <th>Lng</th>\n",
       "      <th>Lat</th>\n",
       "      <th>OpenStatus</th>\n",
       "      <th>Speed</th>\n",
       "      <th>Speed1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>22271</td>\n",
       "      <td>22:54:04</td>\n",
       "      <td>114.167000</td>\n",
       "      <td>22.718399</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>22271</td>\n",
       "      <td>18:26:26</td>\n",
       "      <td>114.190598</td>\n",
       "      <td>22.647800</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>22271</td>\n",
       "      <td>18:35:18</td>\n",
       "      <td>114.201401</td>\n",
       "      <td>22.649700</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>22271</td>\n",
       "      <td>16:02:46</td>\n",
       "      <td>114.233498</td>\n",
       "      <td>22.725901</td>\n",
       "      <td>0</td>\n",
       "      <td>24</td>\n",
       "      <td>48</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>22271</td>\n",
       "      <td>21:41:17</td>\n",
       "      <td>114.233597</td>\n",
       "      <td>22.720900</td>\n",
       "      <td>0</td>\n",
       "      <td>19</td>\n",
       "      <td>38</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   VehicleNum     Stime         Lng        Lat  OpenStatus  Speed  Speed1\n",
       "0       22271  22:54:04  114.167000  22.718399           0      0       0\n",
       "1       22271  18:26:26  114.190598  22.647800           0      4       8\n",
       "2       22271  18:35:18  114.201401  22.649700           0      0       0\n",
       "3       22271  16:02:46  114.233498  22.725901           0     24      48\n",
       "4       22271  21:41:17  114.233597  22.720900           0     19      38"
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#定义列'Speed1'为Speed列的两倍,注意，此操作会影响到data\n",
    "data['Speed1']=data['Speed']*2\n",
    "#或者\n",
    "data.loc[:,'Speed1']=data['Speed']*2\n",
    "\n",
    "data.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:53:45.170336Z",
     "start_time": "2020-01-18T04:53:45.109499Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>VehicleNum</th>\n",
       "      <th>Lng</th>\n",
       "      <th>Lat</th>\n",
       "      <th>OpenStatus</th>\n",
       "      <th>Speed</th>\n",
       "      <th>Speed1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>22271</td>\n",
       "      <td>114.167000</td>\n",
       "      <td>22.718399</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>22271</td>\n",
       "      <td>114.190598</td>\n",
       "      <td>22.647800</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>22271</td>\n",
       "      <td>114.201401</td>\n",
       "      <td>22.649700</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>22271</td>\n",
       "      <td>114.233498</td>\n",
       "      <td>22.725901</td>\n",
       "      <td>0</td>\n",
       "      <td>24</td>\n",
       "      <td>48</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>22271</td>\n",
       "      <td>114.233597</td>\n",
       "      <td>22.720900</td>\n",
       "      <td>0</td>\n",
       "      <td>19</td>\n",
       "      <td>38</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   VehicleNum         Lng        Lat  OpenStatus  Speed  Speed1\n",
       "0       22271  114.167000  22.718399           0      0       0\n",
       "1       22271  114.190598  22.647800           0      4       8\n",
       "2       22271  114.201401  22.649700           0      0       0\n",
       "3       22271  114.233498  22.725901           0     24      48\n",
       "4       22271  114.233597  22.720900           0     19      38"
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#删除列'Stime',注意，此操作不会影响到data，你在操作后必须将得到的表重新赋值给data才有影响\n",
    "data.drop(['Stime'],axis=1).head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:53:48.056620Z",
     "start_time": "2020-01-18T04:53:47.972844Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>VehicleNum</th>\n",
       "      <th>Stime</th>\n",
       "      <th>Lng</th>\n",
       "      <th>Lat</th>\n",
       "      <th>OpenStatus</th>\n",
       "      <th>Speed</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>22271</td>\n",
       "      <td>22:54:04</td>\n",
       "      <td>114.167000</td>\n",
       "      <td>22.718399</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>22271</td>\n",
       "      <td>18:26:26</td>\n",
       "      <td>114.190598</td>\n",
       "      <td>22.647800</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>22271</td>\n",
       "      <td>18:35:18</td>\n",
       "      <td>114.201401</td>\n",
       "      <td>22.649700</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>22271</td>\n",
       "      <td>16:02:46</td>\n",
       "      <td>114.233498</td>\n",
       "      <td>22.725901</td>\n",
       "      <td>0</td>\n",
       "      <td>24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>22271</td>\n",
       "      <td>21:41:17</td>\n",
       "      <td>114.233597</td>\n",
       "      <td>22.720900</td>\n",
       "      <td>0</td>\n",
       "      <td>19</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   VehicleNum     Stime         Lng        Lat  OpenStatus  Speed\n",
       "0       22271  22:54:04  114.167000  22.718399           0      0\n",
       "1       22271  18:26:26  114.190598  22.647800           0      4\n",
       "2       22271  18:35:18  114.201401  22.649700           0      0\n",
       "3       22271  16:02:46  114.233498  22.725901           0     24\n",
       "4       22271  21:41:17  114.233597  22.720900           0     19"
      ]
     },
     "execution_count": 97,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#删除列'Stime',注意，此操作不会影响到data，你在操作后必须将得到的表重新赋值给data才有影响\n",
    "#axis=1表示的是，对列进行删除，如果axis=0，则是对行删除，但是建议不用这个功能对行删除\n",
    "data = data.drop(['Speed1'],axis=1)\n",
    "data.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 获取某一列某一行的数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在获取某行某列的数据时，记得一定要用iloc(按表目前排列的顺序取)，不能用loc(按index取)  \n",
    "因为很多时候我们做完筛选、排序等操作，表就不是按index来排列，用loc取列就会取错列，或者直接报错（没有这个index）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:53:50.234795Z",
     "start_time": "2020-01-18T04:53:50.226816Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'16:02:46'"
      ]
     },
     "execution_count": 98,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#获取Stime列的第4行数据\n",
    "data['Stime'].iloc[3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 数据清洗"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "首先，我们将数据按车牌、时间排序，特别要注意，data排序后需要再赋值给data，否则没作用"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:53:52.863800Z",
     "start_time": "2020-01-18T04:53:51.815569Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>VehicleNum</th>\n",
       "      <th>Stime</th>\n",
       "      <th>Lng</th>\n",
       "      <th>Lat</th>\n",
       "      <th>OpenStatus</th>\n",
       "      <th>Speed</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>39</td>\n",
       "      <td>22271</td>\n",
       "      <td>00:00:49</td>\n",
       "      <td>114.266502</td>\n",
       "      <td>22.728201</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>397</td>\n",
       "      <td>22271</td>\n",
       "      <td>00:01:48</td>\n",
       "      <td>114.266502</td>\n",
       "      <td>22.728201</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1413</td>\n",
       "      <td>22271</td>\n",
       "      <td>00:02:47</td>\n",
       "      <td>114.266502</td>\n",
       "      <td>22.728201</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>244</td>\n",
       "      <td>22271</td>\n",
       "      <td>00:03:46</td>\n",
       "      <td>114.266502</td>\n",
       "      <td>22.728201</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>247</td>\n",
       "      <td>22271</td>\n",
       "      <td>00:04:45</td>\n",
       "      <td>114.268898</td>\n",
       "      <td>22.729500</td>\n",
       "      <td>0</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      VehicleNum     Stime         Lng        Lat  OpenStatus  Speed\n",
       "39         22271  00:00:49  114.266502  22.728201           0      0\n",
       "397        22271  00:01:48  114.266502  22.728201           0      0\n",
       "1413       22271  00:02:47  114.266502  22.728201           0      0\n",
       "244        22271  00:03:46  114.266502  22.728201           0      0\n",
       "247        22271  00:04:45  114.268898  22.729500           0     11"
      ]
     },
     "execution_count": 99,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 将数据排序,并把排序后的数据赋值给原来的数据\n",
    "data = data.sort_values(by = ['VehicleNum','Stime'])\n",
    "data.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们现在要做的是，用出租车GPS数据识别OD：\n",
    "数据按车牌、时间排序后，正常情况下的OpenStatus是这样的：\n",
    "\n",
    "|OpenStatus |  |\n",
    "| :-----------: |-----------|\n",
    "|0||\n",
    "|0||\n",
    "|0||\n",
    "|0||\n",
    "|0||\n",
    "|1|←此时乘客上车了|\n",
    "|1||\n",
    "|1||\n",
    "|1||\n",
    "|1||\n",
    "|1||\n",
    "|1||\n",
    "|0|     ←此时乘客下车了|\n",
    "|0||\n",
    "|0||\n",
    "|0||\n",
    "\n",
    "但是，也会有时候有数据异常出现，比如：\n",
    "\n",
    "|OpenStatus |  |\n",
    "| ----------- |-----------|\n",
    "|0||\n",
    "|0||\n",
    "|0||\n",
    "|0||\n",
    "|0||\n",
    "|1|←异常|\n",
    "|0||\n",
    "|0||\n",
    "|0||\n",
    "|0||\n",
    "\n",
    "或者\n",
    "\n",
    "|OpenStatus |  |\n",
    "| ----------- |-----------|\n",
    "|1||\n",
    "|1||\n",
    "|1||\n",
    "|1||\n",
    "|1||\n",
    "|0|←异常|\n",
    "|1||\n",
    "|1||\n",
    "|1||\n",
    "|1||\n",
    "\n",
    "前后都是0，突然有一条数据变成1，或者前后都是1，突然变成0。这种异常情况我们是要排除的"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在pandas的数据处理过程中，我们筛掉不要的数据用下面的方法是最好的\n",
    "\n",
    "   > data[条件]是保留符合条件的数据  \n",
    "    data[-(条件)]是删除符合条件的数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "    Series的shift()函数能够将数据按顺序后移一位\n",
    "    Series的shift(-1)函数能够将数据按顺序前移一位\n",
    "    因此我们要判断的是，如果：\n",
    "    后一位和前一位相等，但是后一位与中间一位不等，那么中间一位的数据就要删除（前一条数据，中间一条数据，后一条数据的车牌必须相等）\n",
    "    \n",
    "所以，我们立马开始筛选，把异常数据删除："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:53:55.724226Z",
     "start_time": "2020-01-18T04:53:55.717245Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1601307"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#筛选前的数据量\n",
    "len(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:53:56.442306Z",
     "start_time": "2020-01-18T04:53:56.438317Z"
    }
   },
   "outputs": [],
   "source": [
    "###############################你需要在下面写代码##################################\n",
    "#这里，把上述异常数据清洗出来\n",
    "#用到的条件是：\n",
    "#1.后一位和前一位相等\n",
    "#2.但是后一位与中间一位不等\n",
    "#3.前一条数据，后一条数据的车牌相等\n",
    "#4.中间一条数据，后一条数据的车牌相等\n",
    "\n",
    "#各条件之间的交集可以用\n",
    "#data[(条件1)&(条件2)]\n",
    "#各条件之间的并集可以用\n",
    "#data[(条件1)|(条件2)]\n",
    "\n",
    "\n",
    "#data = \n",
    "\n",
    "\n",
    "###################################################################################"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:53:57.340933Z",
     "start_time": "2020-01-18T04:53:57.057662Z"
    }
   },
   "outputs": [],
   "source": [
    "###############################     答    案    ##################################\n",
    "data = data[-((data['OpenStatus'].shift(-1) == data['OpenStatus'].shift())&\n",
    "(data['OpenStatus'].shift(-1) != data['OpenStatus'])&\n",
    "(data['VehicleNum'].shift(-1) == data['VehicleNum'].shift())&\n",
    "(data['VehicleNum'].shift(-1) == data['VehicleNum']))]\n",
    "\n",
    "###################################################################################"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:53:58.198611Z",
     "start_time": "2020-01-18T04:53:58.191630Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1598866"
      ]
     },
     "execution_count": 103,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#如果你代码对的话，筛选完了data的数据量应该是\n",
    "len(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 识别OD"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 乘客上下车的状态变化识别"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:54:00.064621Z",
     "start_time": "2020-01-18T04:54:00.050659Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>VehicleNum</th>\n",
       "      <th>Stime</th>\n",
       "      <th>Lng</th>\n",
       "      <th>Lat</th>\n",
       "      <th>OpenStatus</th>\n",
       "      <th>Speed</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>39</td>\n",
       "      <td>22271</td>\n",
       "      <td>00:00:49</td>\n",
       "      <td>114.266502</td>\n",
       "      <td>22.728201</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>397</td>\n",
       "      <td>22271</td>\n",
       "      <td>00:01:48</td>\n",
       "      <td>114.266502</td>\n",
       "      <td>22.728201</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1413</td>\n",
       "      <td>22271</td>\n",
       "      <td>00:02:47</td>\n",
       "      <td>114.266502</td>\n",
       "      <td>22.728201</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>244</td>\n",
       "      <td>22271</td>\n",
       "      <td>00:03:46</td>\n",
       "      <td>114.266502</td>\n",
       "      <td>22.728201</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>247</td>\n",
       "      <td>22271</td>\n",
       "      <td>00:04:45</td>\n",
       "      <td>114.268898</td>\n",
       "      <td>22.729500</td>\n",
       "      <td>0</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      VehicleNum     Stime         Lng        Lat  OpenStatus  Speed\n",
       "39         22271  00:00:49  114.266502  22.728201           0      0\n",
       "397        22271  00:01:48  114.266502  22.728201           0      0\n",
       "1413       22271  00:02:47  114.266502  22.728201           0      0\n",
       "244        22271  00:03:46  114.266502  22.728201           0      0\n",
       "247        22271  00:04:45  114.268898  22.729500           0     11"
      ]
     },
     "execution_count": 104,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接下来，我们把下一条数据的信息放到前一条数据上，这样子，就能很方便的比较这条数据和下条数据的差异\n",
    "\n",
    "在字段名加个1，代表后面一条数据的值"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "另外我们定义StatusChange为下一条数据的OpenStatus减去这一条数据的OpenStatus，这样就会出现：\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "|OpenStatus     |   OpenStatus1    |  StatusChange||\n",
    "| ----------- |-----------|||\n",
    "|0          |       0    |             0||\n",
    "|0          |       0    |             0||\n",
    "|0         |        0    |             0||\n",
    "|0          |       1    |             1 |    ←此时乘客上车了|\n",
    "|1          |       1    |             0  |   ←此时乘客上车了|\n",
    "|1          |       1    |             0||\n",
    "|1          |       1    |             0||\n",
    "|1          |       1    |             0||\n",
    "|1          |       1    |             0||\n",
    "|1          |       1    |             0||\n",
    "|1          |       1    |             0||\n",
    "|1          |       1    |             0||\n",
    "|1          |       1    |             0||\n",
    "|1          |       0    |             -1|    ←此时乘客下车了|\n",
    "|0          |       0    |             0  |  ←此时乘客下车了|\n",
    "|0          |       0    |             0||\n",
    "|0          |       0    |             0||\n",
    "|0          |       0    |             0||\n",
    "\n",
    "注意到，乘客上车和下车都有两条数据，一般我们认为这两条数据的位置和时间非常接近，都可以认为是下车或者上车的地点"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:54:01.994461Z",
     "start_time": "2020-01-18T04:54:01.989475Z"
    }
   },
   "outputs": [],
   "source": [
    "###############################你需要在下面写代码##################################\n",
    "#让这几个字段的下一条数据赋值给新的字段，在字段名加个1，代表后面一条数据的值\n",
    "#data.loc[:,'OpenStatus1'] = \n",
    "#data.loc[:,'VehicleNum1'] = \n",
    "#data.loc[:,'Lng1'] = \n",
    "#data.loc[:,'Lat1'] = \n",
    "#data.loc[:,'Stime1'] = \n",
    "\n",
    "#data.loc[:,'StatusChange'] = \n",
    "\n",
    "\n",
    "###################################################################################"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:54:03.862529Z",
     "start_time": "2020-01-18T04:54:02.623838Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>VehicleNum</th>\n",
       "      <th>Stime</th>\n",
       "      <th>Lng</th>\n",
       "      <th>Lat</th>\n",
       "      <th>OpenStatus</th>\n",
       "      <th>Speed</th>\n",
       "      <th>OpenStatus1</th>\n",
       "      <th>VehicleNum1</th>\n",
       "      <th>Lng1</th>\n",
       "      <th>Lat1</th>\n",
       "      <th>Stime1</th>\n",
       "      <th>StatusChange</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>39</td>\n",
       "      <td>22271</td>\n",
       "      <td>00:00:49</td>\n",
       "      <td>114.266502</td>\n",
       "      <td>22.728201</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>22271.0</td>\n",
       "      <td>114.266502</td>\n",
       "      <td>22.728201</td>\n",
       "      <td>00:01:48</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>397</td>\n",
       "      <td>22271</td>\n",
       "      <td>00:01:48</td>\n",
       "      <td>114.266502</td>\n",
       "      <td>22.728201</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>22271.0</td>\n",
       "      <td>114.266502</td>\n",
       "      <td>22.728201</td>\n",
       "      <td>00:02:47</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1413</td>\n",
       "      <td>22271</td>\n",
       "      <td>00:02:47</td>\n",
       "      <td>114.266502</td>\n",
       "      <td>22.728201</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>22271.0</td>\n",
       "      <td>114.266502</td>\n",
       "      <td>22.728201</td>\n",
       "      <td>00:03:46</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>244</td>\n",
       "      <td>22271</td>\n",
       "      <td>00:03:46</td>\n",
       "      <td>114.266502</td>\n",
       "      <td>22.728201</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>22271.0</td>\n",
       "      <td>114.268898</td>\n",
       "      <td>22.729500</td>\n",
       "      <td>00:04:45</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>247</td>\n",
       "      <td>22271</td>\n",
       "      <td>00:04:45</td>\n",
       "      <td>114.268898</td>\n",
       "      <td>22.729500</td>\n",
       "      <td>0</td>\n",
       "      <td>11</td>\n",
       "      <td>0.0</td>\n",
       "      <td>22271.0</td>\n",
       "      <td>114.272003</td>\n",
       "      <td>22.731199</td>\n",
       "      <td>00:05:44</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      VehicleNum     Stime         Lng        Lat  OpenStatus  Speed  \\\n",
       "39         22271  00:00:49  114.266502  22.728201           0      0   \n",
       "397        22271  00:01:48  114.266502  22.728201           0      0   \n",
       "1413       22271  00:02:47  114.266502  22.728201           0      0   \n",
       "244        22271  00:03:46  114.266502  22.728201           0      0   \n",
       "247        22271  00:04:45  114.268898  22.729500           0     11   \n",
       "\n",
       "      OpenStatus1  VehicleNum1        Lng1       Lat1    Stime1  StatusChange  \n",
       "39            0.0      22271.0  114.266502  22.728201  00:01:48           0.0  \n",
       "397           0.0      22271.0  114.266502  22.728201  00:02:47           0.0  \n",
       "1413          0.0      22271.0  114.266502  22.728201  00:03:46           0.0  \n",
       "244           0.0      22271.0  114.268898  22.729500  00:04:45           0.0  \n",
       "247           0.0      22271.0  114.272003  22.731199  00:05:44           0.0  "
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "###############################     答    案    ##################################\n",
    "#让这几个字段的下一条数据赋值给新的字段，在字段名加个1，代表后面一条数据的值\n",
    "data.loc[:,'OpenStatus1'] = data['OpenStatus'].shift(-1)\n",
    "data.loc[:,'VehicleNum1'] = data['VehicleNum'].shift(-1)\n",
    "data.loc[:,'Lng1'] = data['Lng'].shift(-1)\n",
    "data.loc[:,'Lat1'] = data['Lat'].shift(-1)\n",
    "data.loc[:,'Stime1'] = data['Stime'].shift(-1)\n",
    "\n",
    "data.loc[:,'StatusChange'] = data['OpenStatus1']-data['OpenStatus']\n",
    "\n",
    "\n",
    "###################################################################################\n",
    "data.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 将上下车状态整理为OD"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "    这里，我只想保留StatusChange字段为1或者-1的数据，因此要把这些数据筛出来\n",
    "    不过，还得加一个条件，就是这条数据和下一条数据的车辆ID必须是相同的"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:54:05.800844Z",
     "start_time": "2020-01-18T04:54:05.795858Z"
    }
   },
   "outputs": [],
   "source": [
    "###############################你需要在下面写代码##################################\n",
    "#两个条件：\n",
    "#1.StatusChange字段为1或者-1\n",
    "#2.这条数据和下一条数据的车辆ID必须是相同的\n",
    "#data=\n",
    "\n",
    "###################################################################################"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:54:06.863033Z",
     "start_time": "2020-01-18T04:54:06.800172Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>VehicleNum</th>\n",
       "      <th>Stime</th>\n",
       "      <th>Lng</th>\n",
       "      <th>Lat</th>\n",
       "      <th>OpenStatus</th>\n",
       "      <th>Speed</th>\n",
       "      <th>OpenStatus1</th>\n",
       "      <th>VehicleNum1</th>\n",
       "      <th>Lng1</th>\n",
       "      <th>Lat1</th>\n",
       "      <th>Stime1</th>\n",
       "      <th>StatusChange</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>1548741</td>\n",
       "      <td>22334</td>\n",
       "      <td>00:00:52</td>\n",
       "      <td>114.111130</td>\n",
       "      <td>22.576750</td>\n",
       "      <td>1</td>\n",
       "      <td>13</td>\n",
       "      <td>0.0</td>\n",
       "      <td>22334.0</td>\n",
       "      <td>114.111130</td>\n",
       "      <td>22.576750</td>\n",
       "      <td>00:01:04</td>\n",
       "      <td>-1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1548351</td>\n",
       "      <td>22334</td>\n",
       "      <td>00:07:44</td>\n",
       "      <td>114.080498</td>\n",
       "      <td>22.554182</td>\n",
       "      <td>0</td>\n",
       "      <td>11</td>\n",
       "      <td>1.0</td>\n",
       "      <td>22334.0</td>\n",
       "      <td>114.080498</td>\n",
       "      <td>22.554182</td>\n",
       "      <td>00:07:57</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1549620</td>\n",
       "      <td>22334</td>\n",
       "      <td>00:17:58</td>\n",
       "      <td>114.084915</td>\n",
       "      <td>22.540850</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0.0</td>\n",
       "      <td>22334.0</td>\n",
       "      <td>114.084915</td>\n",
       "      <td>22.540850</td>\n",
       "      <td>00:18:16</td>\n",
       "      <td>-1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1547182</td>\n",
       "      <td>22334</td>\n",
       "      <td>00:18:56</td>\n",
       "      <td>114.084915</td>\n",
       "      <td>22.540850</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>22334.0</td>\n",
       "      <td>114.084915</td>\n",
       "      <td>22.540850</td>\n",
       "      <td>00:19:05</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1547627</td>\n",
       "      <td>22334</td>\n",
       "      <td>00:44:47</td>\n",
       "      <td>114.056236</td>\n",
       "      <td>22.633383</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>0.0</td>\n",
       "      <td>22334.0</td>\n",
       "      <td>114.056236</td>\n",
       "      <td>22.633383</td>\n",
       "      <td>00:44:52</td>\n",
       "      <td>-1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         VehicleNum     Stime         Lng        Lat  OpenStatus  Speed  \\\n",
       "1548741       22334  00:00:52  114.111130  22.576750           1     13   \n",
       "1548351       22334  00:07:44  114.080498  22.554182           0     11   \n",
       "1549620       22334  00:17:58  114.084915  22.540850           1      2   \n",
       "1547182       22334  00:18:56  114.084915  22.540850           0      0   \n",
       "1547627       22334  00:44:47  114.056236  22.633383           1      3   \n",
       "\n",
       "         OpenStatus1  VehicleNum1        Lng1       Lat1    Stime1  \\\n",
       "1548741          0.0      22334.0  114.111130  22.576750  00:01:04   \n",
       "1548351          1.0      22334.0  114.080498  22.554182  00:07:57   \n",
       "1549620          0.0      22334.0  114.084915  22.540850  00:18:16   \n",
       "1547182          1.0      22334.0  114.084915  22.540850  00:19:05   \n",
       "1547627          0.0      22334.0  114.056236  22.633383  00:44:52   \n",
       "\n",
       "         StatusChange  \n",
       "1548741          -1.0  \n",
       "1548351           1.0  \n",
       "1549620          -1.0  \n",
       "1547182           1.0  \n",
       "1547627          -1.0  "
      ]
     },
     "execution_count": 108,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "###############################     答    案    ##################################\n",
    "data = data[((data['StatusChange'] == 1)|(data['StatusChange'] == -1))\n",
    "&(data['VehicleNum'] == data['VehicleNum1'])]\n",
    "\n",
    "###################################################################################\n",
    "data.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:54:07.783542Z",
     "start_time": "2020-01-18T04:54:07.764594Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>VehicleNum</th>\n",
       "      <th>Stime</th>\n",
       "      <th>Lng</th>\n",
       "      <th>Lat</th>\n",
       "      <th>StatusChange</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>1548741</td>\n",
       "      <td>22334</td>\n",
       "      <td>00:00:52</td>\n",
       "      <td>114.111130</td>\n",
       "      <td>22.576750</td>\n",
       "      <td>-1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1548351</td>\n",
       "      <td>22334</td>\n",
       "      <td>00:07:44</td>\n",
       "      <td>114.080498</td>\n",
       "      <td>22.554182</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1549620</td>\n",
       "      <td>22334</td>\n",
       "      <td>00:17:58</td>\n",
       "      <td>114.084915</td>\n",
       "      <td>22.540850</td>\n",
       "      <td>-1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1547182</td>\n",
       "      <td>22334</td>\n",
       "      <td>00:18:56</td>\n",
       "      <td>114.084915</td>\n",
       "      <td>22.540850</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1547627</td>\n",
       "      <td>22334</td>\n",
       "      <td>00:44:47</td>\n",
       "      <td>114.056236</td>\n",
       "      <td>22.633383</td>\n",
       "      <td>-1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         VehicleNum     Stime         Lng        Lat  StatusChange\n",
       "1548741       22334  00:00:52  114.111130  22.576750          -1.0\n",
       "1548351       22334  00:07:44  114.080498  22.554182           1.0\n",
       "1549620       22334  00:17:58  114.084915  22.540850          -1.0\n",
       "1547182       22334  00:18:56  114.084915  22.540850           1.0\n",
       "1547627       22334  00:44:47  114.056236  22.633383          -1.0"
      ]
     },
     "execution_count": 109,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#data数据只保留一些我们需要的字段\n",
    "data = data[['VehicleNum','Stime','Lng','Lat','StatusChange']]\n",
    "data.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "    我们现在就得到了乘客哪里上车，哪里下车。\n",
    "    而我们想要得到的OD数据形式是，每一行记录包括了信息：车辆ID，上车时间，上车地点，下车时间，下车地点\n",
    "    这样一行数据就是一个OD\n",
    "    所以，接下来的操作就是。。。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:54:15.682458Z",
     "start_time": "2020-01-18T04:54:15.669456Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>VehicleNum</th>\n",
       "      <th>Stime</th>\n",
       "      <th>SLng</th>\n",
       "      <th>SLat</th>\n",
       "      <th>ELng</th>\n",
       "      <th>ELat</th>\n",
       "      <th>Etime</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>1548351</td>\n",
       "      <td>22334</td>\n",
       "      <td>00:07:44</td>\n",
       "      <td>114.080498</td>\n",
       "      <td>22.554182</td>\n",
       "      <td>114.084915</td>\n",
       "      <td>22.540850</td>\n",
       "      <td>00:17:58</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1547182</td>\n",
       "      <td>22334</td>\n",
       "      <td>00:18:56</td>\n",
       "      <td>114.084915</td>\n",
       "      <td>22.540850</td>\n",
       "      <td>114.056236</td>\n",
       "      <td>22.633383</td>\n",
       "      <td>00:44:47</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1547511</td>\n",
       "      <td>22334</td>\n",
       "      <td>02:38:35</td>\n",
       "      <td>114.091637</td>\n",
       "      <td>22.543200</td>\n",
       "      <td>114.093498</td>\n",
       "      <td>22.554382</td>\n",
       "      <td>02:46:52</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1547789</td>\n",
       "      <td>22334</td>\n",
       "      <td>03:58:46</td>\n",
       "      <td>114.038818</td>\n",
       "      <td>22.553232</td>\n",
       "      <td>114.052299</td>\n",
       "      <td>22.604366</td>\n",
       "      <td>04:13:57</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1547764</td>\n",
       "      <td>22334</td>\n",
       "      <td>06:30:11</td>\n",
       "      <td>114.031250</td>\n",
       "      <td>22.519550</td>\n",
       "      <td>114.067886</td>\n",
       "      <td>22.521299</td>\n",
       "      <td>06:41:19</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         VehicleNum     Stime        SLng       SLat        ELng       ELat  \\\n",
       "1548351       22334  00:07:44  114.080498  22.554182  114.084915  22.540850   \n",
       "1547182       22334  00:18:56  114.084915  22.540850  114.056236  22.633383   \n",
       "1547511       22334  02:38:35  114.091637  22.543200  114.093498  22.554382   \n",
       "1547789       22334  03:58:46  114.038818  22.553232  114.052299  22.604366   \n",
       "1547764       22334  06:30:11  114.031250  22.519550  114.067886  22.521299   \n",
       "\n",
       "            Etime  \n",
       "1548351  00:17:58  \n",
       "1547182  00:44:47  \n",
       "1547511  02:46:52  \n",
       "1547789  04:13:57  \n",
       "1547764  06:41:19  "
      ]
     },
     "execution_count": 111,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "###############################你需要在下面写代码##################################\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "###################################################################################\n",
    "data.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:54:12.768242Z",
     "start_time": "2020-01-18T04:54:12.732310Z"
    }
   },
   "outputs": [],
   "source": [
    "###############################     答    案    ##################################\n",
    "data = data.rename(columns = {'Lng':'SLng','Lat':'SLat'})\n",
    "data['ELng'] = data['SLng'].shift(-1)\n",
    "data['ELat'] = data['SLat'].shift(-1)\n",
    "data['Etime'] = data['Stime'].shift(-1)\n",
    "data = data[data['StatusChange'] == 1]\n",
    "data = data.drop('StatusChange',axis = 1)\n",
    "###################################################################################"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "    大功告成！接下来就是保存  \n",
    "    数据文件夹下有一个TaxiOD.csv文件是我用1.7GB的全部数据计算的OD"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-18T04:20:14.690034Z",
     "start_time": "2020-01-18T04:20:11.579344Z"
    }
   },
   "outputs": [],
   "source": [
    "data.to_csv(r'data-sample\\TaxiOD-Sample.csv',index = None)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "211px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
